Dataset statistics
| Number of variables | 5 |
|---|---|
| Number of observations | 1000 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 32.4 KiB |
| Average record size in memory | 33.1 B |
Variable types
| Numeric | 3 |
|---|---|
| Boolean | 1 |
| Categorical | 1 |
Alerts
| df_index has unique values | Unique |
|---|---|
| interestedPartyStatementID has unique values | Unique |
| subjectStatementID has unique values | Unique |
Reproduction
| Analysis started | 2022-06-01 21:24:14.221596 |
|---|---|
| Analysis finished | 2022-06-01 21:25:05.660699 |
| Duration | 51.44 seconds |
| Software version | pandas-profiling v3.2.0 |
df_index
Numeric
| Distinct | 1000 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2805.055 |
| Minimum | 9 |
| Maximum | 5558 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 9 |
|---|---|
| 5-th percentile | 263.85 |
| Q1 | 1420.5 |
| median | 2768 |
| Q3 | 4217.5 |
| 95-th percentile | 5361.2 |
| Maximum | 5558 |
| Range | 5549 |
| Interquartile range (IQR) | 2797 |
Descriptive statistics
| Standard deviation | 1633.179139 |
|---|---|
| Coefficient of variation (CV) | 0.582227136 |
| Kurtosis | -1.20629847 |
| Mean | 2805.055 |
| Median Absolute Deviation (MAD) | 1412.5 |
| Skewness | -0.01646770117 |
| Sum | 2805055 |
| Variance | 2667274.1 |
| Monotonicity | Not monotonic |
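Several of the figures above are derived from others in the same tables: the range from the minimum and maximum, the IQR from the quartiles, the CV and variance from the standard deviation and mean, and the sum from the mean and the number of observations. The following is a minimal sketch that recomputes them as a consistency check. The constants are copied from the tables above; the variable names are illustrative and are not part of the report.

```python
import math

# Values copied from the tables above (the variable with minimum 9, maximum 5558).
minimum, maximum = 9, 5558
q1, q3 = 1420.5, 4217.5
mean, std = 2805.055, 1633.179139
n = 1000  # number of observations

# Quantities that follow directly from the values above.
derived = {
    "range": maximum - minimum,  # max - min
    "iqr": q3 - q1,              # Q3 - Q1
    "cv": std / mean,            # coefficient of variation
    "variance": std ** 2,        # square of the standard deviation
    "sum": mean * n,             # mean times number of observations
}

# Figures as printed in the report, for comparison.
reported = {
    "range": 5549,
    "iqr": 2797,
    "cv": 0.582227136,
    "variance": 2667274.1,
    "sum": 2805055,
}

for name, value in derived.items():
    agrees = math.isclose(value, reported[name], rel_tol=1e-7)
    print(f"{name}: derived={value:.6f} reported={reported[name]} agrees={agrees}")
```

The comparison uses a relative tolerance because the report prints rounded values.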
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 3601 | 1 | 0.1% |
| 1946 | 1 | 0.1% |
| 3708 | 1 | 0.1% |
| 1133 | 1 | 0.1% |
| 4063 | 1 | 0.1% |
| 5479 | 1 | 0.1% |
| 5429 | 1 | 0.1% |
| 4125 | 1 | 0.1% |
| 3450 | 1 | 0.1% |
| 2928 | 1 | 0.1% |
| Other values (990) | 990 | 99.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 9 | 1 | 0.1% |
| 13 | 1 | 0.1% |
| 19 | 1 | 0.1% |
| 28 | 1 | 0.1% |
| 30 | 1 | 0.1% |
| 32 | 1 | 0.1% |
| 34 | 1 | 0.1% |
| 37 | 1 | 0.1% |
| 39 | 1 | 0.1% |
| 51 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 5558 | 1 | 0.1% |
| 5557 | 1 | 0.1% |
| 5554 | 1 | 0.1% |
| 5548 | 1 | 0.1% |
| 5539 | 1 | 0.1% |
| 5538 | 1 | 0.1% |
| 5536 | 1 | 0.1% |
| 5532 | 1 | 0.1% |
| 5529 | 1 | 0.1% |
| 5525 | 1 | 0.1% |
interestedPartyStatementID
Numeric
| Distinct | 1000 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 9.064471042 × 10¹⁸ |
| Minimum | 9.96661279 × 10¹⁵ |
| Maximum | 1.841131821 × 10¹⁹ |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 9.96661279 × 10¹⁵ |
|---|---|
| 5-th percentile | 9.643683988 × 10¹⁷ |
| Q1 | 4.312639068 × 10¹⁸ |
| median | 8.82941965 × 10¹⁸ |
| Q3 | 1.374430733 × 10¹⁹ |
| 95-th percentile | 1.760085101 × 10¹⁹ |
| Maximum | 1.841131821 × 10¹⁹ |
| Range | 1.84013516 × 10¹⁹ |
| Interquartile range (IQR) | 9.431668264 × 10¹⁸ |
Descriptive statistics
| Standard deviation | 5.28530985 × 10¹⁸ |
|---|---|
| Coefficient of variation (CV) | 0.5830797876 |
| Kurtosis | -1.19127615 |
| Mean | 9.064471042 × 10¹⁸ |
| Median Absolute Deviation (MAD) | 4.736985735 × 10¹⁸ |
| Skewness | 0.03800593563 |
| Sum | 9.064471042 × 10²¹ |
| Variance | 2.793450021 × 10³⁷ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2.039954664 × 10¹⁸ | 1 | 0.1% |
| 1.690565218 × 10¹⁹ | 1 | 0.1% |
| 8.428473491 × 10¹⁸ | 1 | 0.1% |
| 7.51977738 × 10¹⁸ | 1 | 0.1% |
| 9.050788094 × 10¹⁸ | 1 | 0.1% |
| 1.541499861 × 10¹⁸ | 1 | 0.1% |
| 1.322564417 × 10¹⁹ | 1 | 0.1% |
| 2.935879797 × 10¹⁸ | 1 | 0.1% |
| 7.860754785 × 10¹⁷ | 1 | 0.1% |
| 1.320258748 × 10¹⁹ | 1 | 0.1% |
| Other values (990) | 990 | 99.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 9.96661279 × 10¹⁵ | 1 | 0.1% |
| 5.059297737 × 10¹⁶ | 1 | 0.1% |
| 5.709866048 × 10¹⁶ | 1 | 0.1% |
| 1.003764656 × 10¹⁷ | 1 | 0.1% |
| 1.106155461 × 10¹⁷ | 1 | 0.1% |
| 1.186848996 × 10¹⁷ | 1 | 0.1% |
| 1.47899958 × 10¹⁷ | 1 | 0.1% |
| 2.05819899 × 10¹⁷ | 1 | 0.1% |
| 2.407618482 × 10¹⁷ | 1 | 0.1% |
| 2.532792532 × 10¹⁷ | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1.841131821 × 10¹⁹ | 1 | 0.1% |
| 1.840987267 × 10¹⁹ | 1 | 0.1% |
| 1.834842509 × 10¹⁹ | 1 | 0.1% |
| 1.834243925 × 10¹⁹ | 1 | 0.1% |
| 1.833862702 × 10¹⁹ | 1 | 0.1% |
| 1.833233604 × 10¹⁹ | 1 | 0.1% |
| 1.832391368 × 10¹⁹ | 1 | 0.1% |
| 1.831339655 × 10¹⁹ | 1 | 0.1% |
| 1.830624871 × 10¹⁹ | 1 | 0.1% |
| 1.83037944 × 10¹⁹ | 1 | 0.1% |
interestedPartyIsPerson
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| True | 912 | 91.2% |
| False | 88 | 8.8% |
subjectStatementID
Numeric
| Distinct | 1000 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 9.197044349 × 10¹⁸ |
| Minimum | 1.048368765 × 10¹⁶ |
| Maximum | 1.844161125 × 10¹⁹ |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 1.048368765 × 10¹⁶ |
|---|---|
| 5-th percentile | 8.59627042 × 10¹⁷ |
| Q1 | 4.327400042 × 10¹⁸ |
| median | 9.177306067 × 10¹⁸ |
| Q3 | 1.370790338 × 10¹⁹ |
| 95-th percentile | 1.758487481 × 10¹⁹ |
| Maximum | 1.844161125 × 10¹⁹ |
| Range | 1.843112756 × 10¹⁹ |
| Interquartile range (IQR) | 9.380503337 × 10¹⁸ |
Descriptive statistics
| Standard deviation | 5.387595119 × 10¹⁸ |
|---|---|
| Coefficient of variation (CV) | 0.5857963618 |
| Kurtosis | -1.240904045 |
| Mean | 9.197044349 × 10¹⁸ |
| Median Absolute Deviation (MAD) | 4.681183506 × 10¹⁸ |
| Skewness | 0.009705259091 |
| Sum | 9.197044349 × 10²¹ |
| Variance | 2.902618117 × 10³⁷ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2.352645712 × 10¹⁸ | 1 | 0.1% |
| 4.302773373 × 10¹⁸ | 1 | 0.1% |
| 1.530775994 × 10¹⁹ | 1 | 0.1% |
| 2.115959212 × 10¹⁸ | 1 | 0.1% |
| 4.941679117 × 10¹⁸ | 1 | 0.1% |
| 1.297618993 × 10¹⁹ | 1 | 0.1% |
| 6.049492525 × 10¹⁸ | 1 | 0.1% |
| 1.145302343 × 10¹⁹ | 1 | 0.1% |
| 1.570640351 × 10¹⁹ | 1 | 0.1% |
| 1.140430196 × 10¹⁹ | 1 | 0.1% |
| Other values (990) | 990 | 99.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1.048368765 × 10¹⁶ | 1 | 0.1% |
| 3.59029514 × 10¹⁶ | 1 | 0.1% |
| 4.31121121 × 10¹⁶ | 1 | 0.1% |
| 5.099757031 × 10¹⁶ | 1 | 0.1% |
| 7.032438087 × 10¹⁶ | 1 | 0.1% |
| 7.142126181 × 10¹⁶ | 1 | 0.1% |
| 1.011228621 × 10¹⁷ | 1 | 0.1% |
| 1.395670208 × 10¹⁷ | 1 | 0.1% |
| 1.518917238 × 10¹⁷ | 1 | 0.1% |
| 1.742856748 × 10¹⁷ | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1.844161125 × 10¹⁹ | 1 | 0.1% |
| 1.841474472 × 10¹⁹ | 1 | 0.1% |
| 1.839014401 × 10¹⁹ | 1 | 0.1% |
| 1.838610119 × 10¹⁹ | 1 | 0.1% |
| 1.836635891 × 10¹⁹ | 1 | 0.1% |
| 1.836011364 × 10¹⁹ | 1 | 0.1% |
| 1.835768645 × 10¹⁹ | 1 | 0.1% |
| 1.835643299 × 10¹⁹ | 1 | 0.1% |
| 1.833870042 × 10¹⁹ | 1 | 0.1% |
| 1.832686034 × 10¹⁹ | 1 | 0.1% |
minimumShare
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.9 KiB |
| 75.0 | 567 |
|---|---|
| 25.0 | 389 |
| 50.0 | 44 |
Length
| Max length | 4 |
|---|---|
| Median length | 4 |
| Mean length | 4 |
| Min length | 4 |
Characters and Unicode
| Total characters | 4000 |
|---|---|
| Distinct characters | 5 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
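As an illustration of how Unicode character properties can be used to analyse a textual variable, the sketch below tallies characters and their general categories for the three minimumShare values, weighted by the counts reported for this variable (567, 389 and 44). It uses Python's standard `unicodedata` module; the variable names are illustrative, and this is not necessarily how the report itself computes these figures.

```python
import unicodedata
from collections import Counter

# Category values and their counts, as reported for minimumShare.
value_counts = {"75.0": 567, "25.0": 389, "50.0": 44}

char_counts = Counter()      # occurrences of each character
category_counts = Counter()  # occurrences per Unicode general category

for text, count in value_counts.items():
    for ch in text:
        char_counts[ch] += count
        # unicodedata.category returns a two-letter code,
        # e.g. "Nd" (Decimal Number) or "Po" (Other Punctuation).
        category_counts[unicodedata.category(ch)] += count

total = sum(char_counts.values())
for ch, count in char_counts.most_common():
    print(f"{ch!r}: {count} ({count / total:.1%})")
print(dict(category_counts))
```

The totals agree with the figures in this section: 4000 characters, 5 distinct characters and 2 distinct categories.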
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 25.0 |
|---|---|
| 2nd row | 50.0 |
| 3rd row | 25.0 |
| 4th row | 25.0 |
| 5th row | 75.0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 75.0 | 567 | 56.7% |
| 25.0 | 389 | 38.9% |
| 50.0 | 44 | 4.4% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
|---|---|---|
| 75.0 | 567 | 56.7% |
| 25.0 | 389 | 38.9% |
| 50.0 | 44 | 4.4% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1044 | 26.1% |
| 5 | 1000 | 25.0% |
| . | 1000 | 25.0% |
| 7 | 567 | 14.2% |
| 2 | 389 | 9.7% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 3000 | 75.0% |
| Other Punctuation | 1000 | 25.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1044 | 34.8% |
| 5 | 1000 | 33.3% |
| 7 | 567 | 18.9% |
| 2 | 389 | 13.0% |
Other Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| . | 1000 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 4000 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1044 | 26.1% |
| 5 | 1000 | 25.0% |
| . | 1000 | 25.0% |
| 7 | 567 | 14.2% |
| 2 | 389 | 9.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 4000 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1044 | 26.1% |
| 5 | 1000 | 25.0% |
| . | 1000 | 25.0% |
| 7 | 567 | 14.2% |
| 2 | 389 | 9.7% |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
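The calculation described above can be sketched in plain Python: rank both variables, then divide the covariance of the ranks by the product of their standard deviations. The function names are illustrative, and ties receive the average of their positions, which is a common convention assumed here rather than taken from the report.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]  # monotonic in x, but not linear

print("Spearman:", spearman_rho(x, y))  # close to 1: perfectly monotonic
print("Pearson: ", pearson_r(x, y))     # below 1: not perfectly linear
```

The example shows the difference in sensitivity: the relationship is perfectly monotonic, so ρ is essentially 1, while r on the raw values is lower.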
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
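A minimal sketch of this calculation, together with a check of the invariance property, follows. The function name and the sample data are illustrative. The check rescales each variable by a positive factor, since a negative factor would flip the sign of r.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Illustrative data, not taken from the report.
x = [1.0, 2.0, 4.0, 7.0]
y = [1.5, 1.0, 3.5, 6.0]

r = pearson_r(x, y)

# Shift and rescale each variable separately (positive scale factors).
r_rescaled = pearson_r([3 * a + 10 for a in x], [0.5 * b - 2 for b in y])

print("r:         ", r)
print("r rescaled:", r_rescaled)
print("unchanged: ", math.isclose(r, r_rescaled, rel_tol=1e-9))
```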
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
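The pair-counting definition above corresponds to the variant usually called τ-a, sketched below. The function name is illustrative. Tie-adjusted variants such as τ-b exist, and this sketch makes no claim about which variant the report uses.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # Pairs tied in x or y count as neither.
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair: 5 concordant, 1 discordant, 6 pairs

print(kendall_tau_a(x, y))  # (5 - 1) / 6
```

This direct version compares every pair, so its cost grows quadratically with the number of observations.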
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available for it.
Missing values
A simple visualization of nullity by column.
The nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
First rows
| | df_index | interestedPartyStatementID | interestedPartyIsPerson | subjectStatementID | minimumShare |
|---|---|---|---|---|---|
| 0 | 3601 | 2039954664013561264 | True | 2352645711550436598 | 25.0 |
| 1 | 2616 | 9027927498965800066 | True | 13616070852555728151 | 50.0 |
| 2 | 224 | 5479055635223948296 | True | 6658248959774506431 | 25.0 |
| 3 | 4693 | 10685402616317734906 | True | 10860295220757086475 | 25.0 |
| 4 | 576 | 14566038486263607917 | True | 16473631643407030477 | 75.0 |
| 5 | 1619 | 1179696533967079481 | True | 16749046166727147573 | 50.0 |
| 6 | 152 | 4830461716565545927 | True | 14388744761212781696 | 75.0 |
| 7 | 5077 | 1481259068514272500 | True | 3919470503037606630 | 25.0 |
| 8 | 3632 | 4923272178238455813 | True | 2133454994847492688 | 25.0 |
| 9 | 5230 | 10894855595172079268 | True | 12204382389873689690 | 25.0 |
Last rows
| | df_index | interestedPartyStatementID | interestedPartyIsPerson | subjectStatementID | minimumShare |
|---|---|---|---|---|---|
| 990 | 1183 | 8657544021888051606 | False | 18220162528667112358 | 75.0 |
| 991 | 4098 | 14773808248550862599 | True | 2226027182400801943 | 25.0 |
| 992 | 4962 | 5745977813030369120 | True | 2594058580586906694 | 25.0 |
| 993 | 5520 | 713813154471278157 | True | 17043481889052088105 | 75.0 |
| 994 | 1545 | 13508469925370910641 | True | 7420618232915944459 | 75.0 |
| 995 | 4535 | 9120463668042900967 | True | 10300662727225129417 | 25.0 |
| 996 | 217 | 13263478187579880481 | True | 5642758813186647222 | 75.0 |
| 997 | 3835 | 9950833446970954170 | True | 12270776901372714358 | 50.0 |
| 998 | 54 | 14140672557023582339 | True | 17740914278808282865 | 25.0 |
| 999 | 4389 | 4697176215894684290 | True | 17311745862895944506 | 75.0 |